Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add reading grib files for stats #33

Open
wants to merge 68 commits into
base: main
Choose a base branch
from
Open

Add reading grib files for stats #33

wants to merge 68 commits into from

Conversation

andleuth
Copy link

@andleuth andleuth commented Jul 19, 2024

  • add reading grib files for stats computation
  • use pinned environment for CI
  • make conda necessary for environment creation

@stelliom
Copy link
Collaborator

stelliom commented Jul 25, 2024

Hi @andleuth,

I looked at the code and added some comments (see above). I also tried to follow the usual ICON+probtest manual workflow (see here) with this branch, however, I am encountering some errors when I try to generate the stats, both with (for the GPU results to check) and without (for the CPU references) the --no-ensemble option. The error(s) I am getting is the following:

[mch_gpu_mixed] mstellio@balfrin-ln002:/scratch/mch/mstellio/test-pt-grib/gpu> python ../externals/probtest/probtest.py stats --no-ensemble --stats-file-name stats_gpu.csv --model-output-dir $SCRATCH/test-pt-grib/gpu/experiments/mch_icon-ch1
initialized logger with level INFO
reading config file from probtest.json
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ECCODES ERROR   :  Unable to find accessor startStepUnit
ECCODES ERROR   :  Unable to get startStep as long (Key/value not found)
ecCodes provides no latitudes/longitudes for gridType='unstructured_grid'
Traceback (most recent call last):
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/xarray/core/dataset.py", line 1475, in _construct_dataarray
    variable = self._variables[name]
               ~~~~~~~~~~~~~~~^^^^^^
KeyError: 'step'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/xarray/core/dataset.py", line 1574, in __getitem__
    return self._construct_dataarray(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/xarray/core/dataset.py", line 1477, in _construct_dataarray
    _, name, variable = _get_virtual_variable(self._variables, name, self.sizes)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/xarray/core/dataset.py", line 210, in _get_virtual_variable
    raise KeyError(key)
KeyError: 'step'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/scratch/mch/mstellio/test-pt-grib/gpu/../externals/probtest/probtest.py", line 57, in <module>
    cli()
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/engine/stats.py", line 100, in stats
    create_stats_dataframe(
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/engine/stats.py", line 15, in create_stats_dataframe
    df = df_from_file_ids(file_id, input_dir, file_specification)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/util/dataframe_ops.py", line 135, in df_from_file_ids
    var_df = read_input_file(
             ^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/util/dataframe_ops.py", line 70, in read_input_file
    var_dfs = file_parser(label, file_name, specification)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/util/model_output_parser.py", line 90, in parse_grib
    sub_df = dataframe_from_ncfile(
             ^^^^^^^^^^^^^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/util/model_output_parser.py", line 236, in dataframe_from_ncfile
    time = xarray_ds[time_dim].values
           ~~~~~~~~~^^^^^^^^^^
  File "/scratch/mch/mstellio/test-pt-grib/externals/probtest/miniconda/envs/probtest/lib/python3.12/site-packages/xarray/core/dataset.py", line 1576, in __getitem__
    raise KeyError(
KeyError: "No variable named 'step'. Variables on the dataset include ['number', 'time', 'isobaricInhPa', 'z']"

I am not sure if you already encountered something similar? If yes, can you recall what the solution was? This reminds me of the issue you had with the steps' units.

PS: I tested this on mch_icon-ch1 with 1-hourly output.

@andleuth
Copy link
Author

Hi @stelliom , the error occured because I did not activate the conda environment before setting the ECCODES_DEFINITION_PATH. Thanks for your hint!

@jonasjucker
Copy link
Collaborator

A small comment:

It would be nice to have that new feature covered with a unittest.

setup_env.sh Outdated Show resolved Hide resolved
@stelliom
Copy link
Collaborator

stelliom commented Aug 7, 2024

Now, I propose these as next steps:

  • Duplicate the icon-ch1_small experiment, but change the filetype of the output to GRIB, such that we have an icon-ch1_small experiment (NetCDF output) and an icon-ch1_small_grb experiment (GRIB output).
  • Add this new icon-ch1_small_grb experiment to BuildBot, basically by looking in all places in ICON (BB related scripts) where the original icon-ch1_small is present and mimicking that code for icon-ch1_small_grb. However, keep in mind that for icon-ch1_small_grb the file_ids will need to be different (i.e. GRIB and not NetCDF)
  • Add the activation and deactivation of probtest's Conda environment before and after calls to probtest in ICON (mostly BB scripts probably), respectively.
  • Once the above are done, open a MR in icon-nwp and comment runBB on that MR. This way we should be able to see if the test is being run correctly by BB. After this is working we can start looking into the reference generation part.



# Setting ECCODES_DEFINITION_PATH:
${CONDA} activate ${ENV_NAME}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this also tested with the -m option? I.e. where CONDA=mamba.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I don't think so. so maybe better to remove this option?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmmh, I am not sure if we ever use mamba for probtest. I guess that is coming from the blueprint, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Judging from line 53 of this file, ${CONDA} activate/deactivate is probably an issue when using mamba, so we should at least change those commands to just conda activate/deactivate. The rest is probably fine, judging from the code above.

However, if we don't care about mamba here, then I agree that we could actually remove it entirely.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it comes from the blueprint. Then lets keep it lean and remove the option.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok 👍

@@ -11,6 +11,7 @@
"member_type": "{{member_type}}",
"factor": 5,
"file_specification": [{
"GRIB": {"format": "GRIB", "time_dim": "step", "horizontal_dims": ["values"], "var_excl": ["tlon", "tlat", "vlon", "vlat", "ulon", "ulat", "h", "slor", "anor", "isor", "sdor"], "fill_value_key": "_FillValue"},
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at some point we may want to come up with a more flexible solution for var_excl, so that we don't have to hardcode it here. But I think this is not in the scope of this PR 👍

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am open for suggestions. Maybe worth to raise an issue?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot remember now what the actual error was if those variables were not excluded, but at the time we were pretty sure that we did not need to test those values, since they seemed to be dimensions/coordinates. If I remember correctly from what Andrea told me, in GRIB there is no distinction between variables and dimensions/coordinates as in NetCDF, do you know if this is true?

I think that the names of these variables are probably not going to be the same in every GRIB file, so the current solution might be very specific to our needs and therefore pretty fragile in case someone wants to use probtest outside ICON or even with different experiments. If we could find a way to distinguish between variables and these "dimensional fields" within the code, then we could simply fill the short_names_excl list using such criterion. I am not an expert with GRIB files, but I guess these variables (judging from their names) probably do not change with time, therefore they might not have a time dimension. Maybe we could leverage this and use it as a criterion to exclude them from probtest, assuming we are never interested in testing variables that are constant in time, which seems reasonable to me. Another uglier option could be to use whatever error those variables were causing to our advantage by handling it in a try-statement and adding the name of the variable to the short_names_excl list in case of error.

I think we could raise an issue for this, but maybe it is not super urgent. I guess it depends how much the list of excluded variables changes from file to file and how many different people will attempt to use probtest of GRIB files.

Sorry for the wall of text 😁

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the wall of text :)
I am not a grib expert, but I would assume it is true. We had a similar problem with the fof files, and there we just included files that have the specific horizontal dimensions, which excludes all the dimension variables.
But I think for now I would just put it all in a desperate issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree! Will you open the issue or should I do it?

@@ -185,6 +192,58 @@ def unify_time_index(fid_dfs):
return fid_dfs_out


def adjust_time_index(fid_dfs):
Copy link
Collaborator

@stelliom stelliom Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this required as part of this PR? I.e. for reading GRIB files.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well. It is a workaround. I am not 100% satisfied with it. So there is unify_time_index which converts the times to integer timesteps. This works not for the case of grib files that are written every 10s. So I added this adjust_time_index routine which will fix it and just overwrites with the time steps.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, ok. However, I am not sure I understand why the fact that GRIB files are written every 10s causes an issue, can you not just replace the times with integer timesteps as well as for NetCDF?

stelliom
stelliom previously approved these changes Oct 28, 2024
Copy link
Collaborator

@stelliom stelliom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @huppd,

I looked at your changes compared to what I already reviewed for Andrea. I added a couple of comments/doubts, but in general this looks fine to me. However, I have to admit I am not super sure about the Python tests, since I am not an expert on that.

I will approve the PR now, since I will be on vacation for the rest of the week. This way you will still be able to merge in my absence. Otherwise, I will come back to this next Monday 🙂

Cheers,
Mikael

@huppd
Copy link
Collaborator

huppd commented Nov 7, 2024

Hi @stelliom
thanks a lot for the valuable review.
I hope I could address your remarks properly.
Cheers,
Daniel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants